---
title: Manage data with the AI Catalog
dataset_name: N/A
description: How to import data to the AI Catalog and how to use the catalog to prepare, blend, and create a project from your data.
domain: platform
expiration_date: 10-10-2024
owner: izzy@datarobot.com
url: docs.datarobot.com/docs/tutorials/prep-learning-data/ai-catalog-tutorial.html

---

# Manage data with the AI Catalog {: #manage-data-with-ai-catalog }

DataRobot’s AI Catalog comprises three key functions:

* **Ingest**: Data is imported into DataRobot and sanitized for use throughout the platform.
* **Storage**: Reusable data assets are stored, accessed, and shared.
* **Data Preparation**: Clean, blend, transform, and enrich your data to maximize the effectiveness of your application.

You can access the AI Catalog from anywhere within DataRobot by clicking the **AI Catalog** tab at the top of the browser.


## Takeaways {: #takeaways }

This tutorial shows you how to:

* Add data to the AI Catalog.
* View information about a dataset.
* Blend a dataset with another dataset using Spark SQL.
* Create a project.

## Add data {: #add-data }

To add data to the AI Catalog:

1. Click **AI Catalog** at the top of the DataRobot window.

2. Click **Add to catalog** and select an import method.

    ![](images/tu-ai-cat-add-to-catalog.png)

    The following table describes the methods:

    | Method | Description |
    |---|---|
    | New Data Connection | [Configure a JDBC connection](data-conn) to import from an external database or data lake. |
    | Existing Data Connection | [Select a configured data source](import-to-dr#import-from-a-data-source) to import data. Select the account and the data you want to add. |
    | Local File | Browse to [upload a local dataset](import-to-dr#import-local-files) or [drag and drop a dataset](import-to-dr#drag-and-drop). |
    | URL | [Import by specifying a URL](import-to-dr#import-a-dataset-from-a-url).|
    | Spark SQL | Use [Spark SQL queries to select and prepare the data](catalog#use-a-sql-query) you want to store. |

DataRobot registers the data after performing an initial exploratory data analysis ([EDA1](eda-explained#eda1)). Once registered, you can do the following:

* [View information](#view-information-about-a-dataset) about a dataset, including its history.
* [Blend the dataset](#blend-a-dataset-using-spark-sql) with another dataset.
* [Create an AutoML project](#create-a-project).

## View information about a dataset {: #view-information-about-a-dataset }

Click a dataset in the catalog to view information about it.

![](images/tu-ai-cat-info.png)

| | Element | Description |
|---|---|---|
|  ![](images/icon-1.png) | Asset tabs | Select a tab to work with the asset (dataset): <ul><li>**Info**: View and edit basic information about the dataset. Update the name and description, and add tags to use for searches. </li><li>**Profile**: Preview dataset column names and row data. </li><li>**Feature Lists**: Create new feature lists and transformations from the dataset. </li><li>**Relationships**: View relationships configured during [Feature Discovery](feature-discovery/index).</li><li>**Version History**: List and view status for all versions of the dataset. Select a version to create a project or download.</li><li>**Comments**: Add a comment to a dataset. Tag users in your comment and DataRobot sends them an email notification. </li></ul> |
| ![](images/icon-2.png) | Dataset Info | Update the name and description, and add tags to use for searches. The number of rows and features appears on the right, along with other details. |
| ![](images/icon-3.png) | State badges | Displayed badges indicate the [state of the asset](catalog-asset#asset-states)&mdash;whether it's being registered, static or dynamic, generated from a Spark SQL query, or snapshotted. |
| ![](images/icon-4.png) | Create project | [Create a machine learning project](#create-a-project) from the dataset. |
| ![](images/icon-5.png) | Share | [Share assets](sharing) with other users, groups, and organizations. |
| ![](images/icon-6.png) | Actions menu | Download, delete, or create a snapshot of the dataset. |
| ![](images/icon-7.png) | Renew Snapshot | Add a [scheduled snapshot](snapshot). |


## Blend a dataset using Spark SQL {: #blend-a-dataset-using-spark-sql }

You can blend two or more datasets and use Spark SQL to select and transform features.

1.  In the catalog, click **Add to catalog** and select **Spark SQL**.

    ![](images/tu-ai-cat-blend-add-to-catalog-spark.png)

2. Click **Add data**.

    ![](images/tu-ai-cat-blend-add-data-sql.png)

3. Select the tables you want to blend and click **Add selected data**.

    ![](images/tu-ai-cat-blend-select.png)

4. For each dataset, click the actions menu and click **Select Features**.

    ![](images/tu-ai-cat-blend-select-features.png)

5. Choose the features and click **Add selected features to SQL**. You can click the right arrows to add features one at a time.

    ![](images/tu-ai-cat-blend-select-sql.png)

6. After adding features from the datasets, write SQL in the editing window to build your query (click **Spark Docs** in the upper right for Spark SQL documentation). Test the query by clicking **Run**.

    ![](images/tu-ai-cat-blend-run-sql.png)

7. Click **Save** when you have the results you want. DataRobot registers the new dataset.

     ![](images/tu-ai-cat-blend-registering.png)
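The query you build in the Spark SQL editor uses standard SQL join syntax. As an illustration only, here is the same blend pattern sketched with Python's built-in `sqlite3` in place of Spark; the table and column names (`customers`, `orders`, `customer_id`) are hypothetical and would be replaced by the datasets and features you selected above.

```python
import sqlite3

# Illustration only: sqlite3 stands in for Spark SQL, and the table/column
# names below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER, region TEXT);
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'EMEA'), (2, 'APAC');
    INSERT INTO orders VALUES (10, 1, 99.5), (11, 1, 12.0), (12, 2, 40.0);
""")

# Blend the two datasets: pick features from each table and join on the
# shared key, much as you would in the AI Catalog's Spark SQL editor.
query = """
    SELECT c.customer_id,
           c.region,
           COUNT(o.order_id) AS n_orders,
           SUM(o.amount)     AS total_amount
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.customer_id
    GROUP BY c.customer_id, c.region
    ORDER BY c.customer_id
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(1, 'EMEA', 2, 111.5), (2, 'APAC', 1, 40.0)]
```

The same `SELECT ... JOIN ... GROUP BY` shape, written in Spark SQL against your registered datasets, produces the blended result that DataRobot saves as a new catalog asset.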

## Create a project {: #create-a-project }

Click a registered dataset in the catalog and click **Create project**. DataRobot uploads the data, conducts [exploratory data analysis](assess-data-quality-eda), and creates the machine learning project. You can then start [building models](model-data).
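The same ingest-and-create flow can also be scripted with DataRobot's Python client. A minimal sketch, assuming you have the `datarobot` package installed and a valid API token; the endpoint, token, and file path shown are placeholders you would replace with your own:

```python
import datarobot as dr

# Placeholder credentials -- use your own deployment's endpoint and API token.
dr.Client(endpoint="https://app.datarobot.com/api/v2", token="YOUR_API_TOKEN")

# Register a local file in the AI Catalog (runs EDA1, as in the UI flow above).
dataset = dr.Dataset.create_from_file(file_path="my_training_data.csv")

# Create a machine learning project from the registered dataset.
project = dr.Project.create_from_dataset(
    dataset.id, project_name="AI Catalog tutorial project"
)
print(project.id)
```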

## Learn more {: #learn-more }

**Documentation:**

* [Dataset requirements](file-types)
* [Import using the AI Catalog](catalog)
* [Connect to data sources](data-conn)
* [Work with catalog assets](catalog-asset)
